Statistical analysis

This section presents descriptive statistics and visualizations, along with results from both Bayesian and frequentist models. For each language, one Bayesian and one frequentist mixed-effects ordinal logistic regression model was run, with the ordered factor rating (1-5) as the outcome variable and sentence type as the predictor. Polish and Norwegian each had 4 conditions: ditransitive reflexive, ditransitive possessive, monotransitive reflexive, and monotransitive possessive. English also had 4 conditions: monotransitive possessive, ditransitive possessive, monotransitive using “own”, and ditransitive using “own”. In both types of model, random intercepts per participant were added to account for the nested structure of the data.
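
One common way to write down a model of this kind is the cumulative logit form sketched here. The parameterization and all symbols are assumptions added for illustration; the report itself does not state them.

```latex
% Y_{ij}: rating (1-5) given by participant j on trial i
% x_{ij}: dummy-coded sentence type; \beta: condition effects
% \tau_1 < \tau_2 < \tau_3 < \tau_4: thresholds between adjacent ratings
% u_j: random intercept for participant j
\begin{aligned}
\operatorname{logit} P(Y_{ij} \le k) &= \tau_k - \left(x_{ij}^{\top}\beta + u_j\right), \qquad k = 1, \dots, 4, \\
P(Y_{ij} = k) &= P(Y_{ij} \le k) - P(Y_{ij} \le k - 1), \\
u_j &\sim \mathcal{N}\left(0, \sigma_u^{2}\right)
\end{aligned}
```

Under this form, a positive condition effect shifts probability mass toward higher ratings relative to the reference condition.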

Results

The tables below give, for each language, the number of ratings at each scale point (1-5) for each sentence type.

Polish descriptive ratings

rating  acceptable_mono_refl_pol  acceptable_di_refl  unacceptable_mono_own  unacceptable_di_poss
1       0                         0                   4                      4
2       1                         4                   4                      5
3       2                         0                   3                      5
4       3                         4                   7                      3
5       19                        18                  6                      9

English descriptive ratings

rating  acceptable_mono_poss  acceptable_di_poss  unacceptable_mono_own  unacceptable_di_own
1       2                     1                   9                      11
2       1                     2                   8                      8
3       6                     4                   4                      4
4       5                     8                   5                      2
5       13                    12                  1                      2

Norwegian descriptive ratings

rating  acceptable_mono_refl  acceptable_di_refl  unacceptable_mono_own  unacceptable_di_poss
1       5                     8                   12                     18
2       8                     9                   5                      12
3       8                     10                  14                     11
4       23                    17                  21                     12
5       31                    31                  23                     21
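
As a rough cross-check on the tables above, the counts can be turned into per-condition summaries (number of ratings, mean, median, and proportion of each rating). The sketch below is in Python rather than the R setup used for the models, uses only the Polish counts copied from the table, and the helper name `summarize` is hypothetical. Treating the 1-5 ratings as numeric for the mean is a simplifying assumption; the ordinal models do not rely on it.

```python
# Rating counts copied from the Polish table (index 0 = rating 1, ..., index 4 = rating 5).
polish_counts = {
    "acceptable_mono_refl_pol": [0, 1, 2, 3, 19],
    "acceptable_di_refl": [0, 4, 0, 4, 18],
    "unacceptable_mono_own": [4, 4, 3, 7, 6],
    "unacceptable_di_poss": [4, 5, 5, 3, 9],
}


def summarize(counts):
    """Summarize a list of counts for ratings 1..len(counts).

    Hypothetical helper: returns the number of ratings, the mean rating,
    the median rating, and the proportion of responses at each rating.
    """
    n = sum(counts)
    mean = sum(rating * c for rating, c in enumerate(counts, start=1)) / n

    # Median taken as the lowest rating whose cumulative count reaches n / 2.
    cumulative = 0
    median = None
    for rating, c in enumerate(counts, start=1):
        cumulative += c
        if cumulative >= n / 2:
            median = rating
            break

    proportions = [c / n for c in counts]
    return {"n": n, "mean": mean, "median": median, "proportions": proportions}


polish_summary = {cond: summarize(c) for cond, c in polish_counts.items()}

for cond, s in polish_summary.items():
    props = ", ".join(f"{p:.2f}" for p in s["proportions"])
    print(f"{cond}: n={s['n']}, mean={s['mean']:.2f}, median={s['median']}, proportions=[{props}]")
```

The same function can be applied to the English and Norwegian counts by adding their columns in the same format.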

Posterior distributions of Polish ratings per condition based on Bayesian ordinal regression

Visualization of results

For each language, the two plots below show the output of a Bayesian ordinal logistic regression (plot 1) and probability curves per condition (plot 2). In plot 1, the points represent the estimated probability (y-axis) with 95% credible intervals for each condition (x-axis). In plot 2, each vertical line marks a condition, and the individual curves show the probability of each rating. The closer together the vertical lines are, the more similarly those conditions were rated in that language.

I sent these plots as individual files:

Polish plot 1 = polish_cond_eff.png

Polish plot 2 = polish_prob_curve.png

Norwegian plot 1 = norway_cond_eff.png

Norwegian plot 2 = norwegian_prob_curve.png

English plot 1 = eng_cond_eff.png

English plot 2 = english_prob_curve.png

Looking at the data, participants appear to follow similar trends in Polish and Norwegian: they accept the conditions that were designed to be acceptable but show variability in the unacceptable conditions. In English, participants showed the highest probability of rating the unacceptable conditions as 1 and the acceptable conditions as 5.

Extracting the specific probabilities per rating step from the model output would take a little more time, since some calculation is necessary. Fortunately, the package I used (brms) produces the conditional effects plots itself, so hand calculation is not necessary. The approximate probabilities per step can also be read from the graphics, so the specific numbers would mainly help in describing the graphics in prose.
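
For reference, the calculation mentioned above can be sketched as follows, assuming a cumulative logit model in which P(rating <= k) = inverse-logit(threshold_k - eta). The function names and all numeric values are hypothetical placeholders chosen for illustration; they are not estimates from the fitted models, and the sketch is in Python rather than R.

```python
import math


def inv_logit(x):
    """Logistic function: maps a value on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def category_probs(thresholds, eta):
    """Return P(rating = k) for k = 1..K under a cumulative logit model.

    thresholds: increasing cut points between adjacent ratings (K - 1 values)
    eta: linear predictor for one condition (0 for the reference condition)
    """
    # Cumulative probabilities P(rating <= k); the last one is 1 by definition.
    cumulative = [inv_logit(t - eta) for t in thresholds] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)  # difference of adjacent cumulative probabilities
        previous = c
    return probs


# Hypothetical values for illustration only (NOT taken from the model output).
thresholds = [-2.0, -1.0, 0.0, 1.0]  # four cut points for a 1-5 scale
eta_reference = 0.0                  # reference condition
eta_other = -1.5                     # a condition rated lower than the reference

p_reference = category_probs(thresholds, eta_reference)
p_other = category_probs(thresholds, eta_other)

for label, probs in [("reference", p_reference), ("other", p_other)]:
    print(label, [round(p, 3) for p in probs])
```

With real output, the thresholds and the condition coefficient would replace the placeholder values, and the calculation would be repeated for each condition.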

Posterior distributions of Norwegian ratings per condition based on Bayesian ordinal regression

Posterior distributions of English ratings per condition based on Bayesian ordinal regression

Plots of conditions between languages